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Abstract 

We extract the frequency content of a noisy signal by use of Dis- 
crete Fourier Transform. Our analysis overcomes the limitations im- 
posed by incommensurate lattices. After computing the deterministic 
component, we show the relevance of the method in removing spurious 
autocorrelations from the signal residuals. Results are presented for a 
temperature time series. 
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1 Introduction 

The computation of the autocorrelation function of a noisy signal usually re- 
quires the removal of a deterministic component often called trend or seasonal 
component. Motivated by the important problem of the statistical analysis 
of temperature data, we concentrate on the case of noisy periodic signals. 
For this particular application, we need to carry out the following steps, (i) 
First, we evaluate the frequencies of the embedded periodic components, (ii) 
Then, we fully identify the deterministic component by a variational princi- 
ple restricted to the class of functions consistent with the results of part (i). 
(iii) Finally, the computation of the autocorrelation function of the residuals 
completes the analysis of the signal. We discuss the theoretical underpin- 
nings of such a method in the special case of a signal with a single periodic 
component, and we demonstrate that this program is often spoiled by the 
subtle consequences of the possible incommensurability of the sampling fre- 
quency and the intrinsic frequency of the signal in question. See for example 
PP for a discussion of the sampling theory of continuous signals. This paper 
quantifies one form of this undesirable effect, and proposes a remedy to the 
resulting ambiguity in the determination of the intrinsic frequency of a noisy 
periodic signal. 

2 Fourier Spectrum 

To establish notation, we begin by giving the explicit formula for the Discrete 
Fourier Transform (DFT for short) of a given vector Xj of length N |2j: 




fc = l, ••• ,N. 



(1) 



With this definition of the Fourier transform, it follows that: 





because of the orthogonality property 




(3) 
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Here and in the following, 5 nk is the Kronecker delta. 
Let us consider a monochromatic signal: 

Xj = Ae~ 2m ^, j = l,...,N, (4) 

where m is a given positive integer between 1 and N. For each integer 
1 < k < N we have: 



X k = AVN e~ 2m — 6 km} (5) 

and hence: 

\X k \ = \A\VN 5 km . (6) 

In other words, the spectrum results in all coefficients being zero apart from 
a peak for the value of k equal to m. From the spectrum we can compute 
the frequency of the periodic signal given the length N (i.e., u = !2 ^r"). A 
difficulty arises when the monochromatic signal is of the form: 

Xj =Ae-^^ } j = l,...,N, ~\<e<\. (7) 

In this case, the frequency cannot be expressed as a ratio of the form (m — 
1)/N and we say that we are facing an incommensurate lattice problem. 
Computing the spectrum, we get: 



\X k \ = \A\' — ^-fr-r^ (g) 
1111 VN\j 1-cos27t^^ V ; 

It is now evident that an incommensurate frequency produces a spread of 
the peak (i.e., X k ^ for k ^ m). This is illustrated in Figure [TJ It is worth 
noting that for e — > and N — > oo, we obtain 

sin7r(m — k) 



\X k \ w \A\ y/N 



Tt(m — k) 



1 + 



7rcot7r(m — k) — 



+0(N 



-3/2> 



m — k 

(9) 

We recognize the zero order term as the Fraunhofer diffraction pattern for 
the single slit [Sj. This should not be surprising as the spread of the Fourier 
spectrum X k is due to the interaction of two wavelengths; one for the lattice 
and the other for the signal. 

The previous analysis shows that the Fourier spectrum is widened even 
in the case of a noise free signal. In the case of a noisy signal the detection 
of the frequency is complicated by the spectrum asymmetries introduced by 
the noise. 
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Figure 1: The spectrum \Xk\ as a function of k and e. \A\ — 1, N — 100 and 
m = 50. The solid lines are a guide to the eye. 



3 The Case of Temperature Data 

We now analyze the average temperature data for the case of Seattle- Tacoma 
international Airport from 1/1/1960 to 12/31/1999 (i.e., 14,610 entries). 
First, we remove the mean value of 52.20 Fahrenheit. Next, we determine 
the frequency of the embedded periodic signal by use of the Fourier paradigm. 
For any subset of the data, the Fourier spectrum gives a peak over a noisy 
background. However, each peak gives different estimates of the period, all 
inconsistent with the naive idea that the more points the better the estimate 
of the period. In Figure El we show the value of the estimated period A as a 
function of N. The A's were computed according to the following prescrip- 
tions. For each A" < 14, 610 we selected the first A" points of the data set 
and computed the period as 

A = ]-—v (10) 

A, max 1 



where fc max is the index for which the absolute value of the spectrum is maxi- 
mum. We always chose the integer k max to be in the range below the Nyquist 
frequency to avoid aliasing artifacts |3]. The uncertainty is dramatic. Even 
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Figure 2: The period A as a function of the number of points N. 

with as many points as 7, 920 — 8, 140 (more than 20 times the 'true' pe- 
riod corresponding to the tropical year), the period takes values in the range 
360 — 370 days. In experimental situations where the value is not foreseeable 
from the very beginning, such uncertainty would be disastrous for estimating 
statistical properties such as variance and autocorrelation function. Let us 
suppose that we are given only the first Ni = 7, 920 data points. We would 
find a peak at fc max = 23, which corresponds to Ai = 360. If we had the first 
N 2 = 8, 140 data, we would find a period of A 2 = 370, even with the same 
fcmax = 23. In order to fully identify the deterministic component dj we use 
the ansatz 



where Q is the number of harmonics used in the estimation. In each case 
we minimize the least square distance between the signal diminished by its 
mean and the expression in (jllj) . For Q = 3, we obtain: 




Q r 



(11) 



Oi b\ a 2 b 2 a 3 b 3 



Ai = 360 
A 2 = 370 



-2.02 -10.58 -0.69 -0.51 -0.02 -0.27 
-9.54 5.22 1.18 -0.18 0.07 0.17 
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Figure 3: Autocorrelation function as a function of the lag. 



The variances of the residuals are a\ = 49.17 and of = 47.33 for Ai and 
A 2 . By using the tropical year estimate (A tr = 365.2422), we get af r = 26.65 
for N = 7,920, and of r = 26.51 for N = 8, 140. It is evident that a wrong 
value of the period gives rise to an overestimated variance of the residuals. 
In Figure El we show how dramatic the effect on the autocorrelation function 
is when a wrong period is used. Not only the residuals have an enlarged 
variance but they also show a spurious persistence. In the case of temperature 
time series, a large persistence would imply the possibility of forecasting the 
weather beyond any reasonable range. 

As discussed above, the determination of the 'true' frequency is paramount 
for an effective analysis of statistical properties of a signal. Here we provide a 
solution to this problem by analyzing the functional dependence of the peaks 
appearing in Figure El More than providing the solution, we want to stress 
that any time the DFT is performed on a finite sample data set a figure like 
Figure El should be generated. One should not seek the largest possible data 
set, but rather study the dependence of the estimated period as a function of 
the number of points. Our ansatz is that both the max and the min 'peaks' 
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of each piecewise linear segment in Figure |2] lie on a curve of the type: 

Ax 



A 



max, mm 



[x) 



B + x 



;i2) 



By least square minimization, we find: 





A 


B 


A m i n 
A ma x 


365.70 
366.19 


194.26 
-165.67 



In both cases the R 2 is 1 apart from round off errors. This means that the 
function in (JT2~j) is not just a good guess, but rather the 'true' decay of the 
estimated period to its large N limit (i.e., A). The results in the above table 
show that the estimates are fairly close to the 'true' period. Furthermore, the 
variance and the autocorrelation function are very close to those computed 
for the tropical year value, 365.2422. To be more specific, the variance for 
N = 8, 140 is 26.79 and 27.54 for A min and A max , respectively. In both cases, 
the autocorrelation function cannot be distinguished from the one obtained 
after using A tr . 

It is worth pointing out that equation (jHJ) is of some help for understand- 
ing a more fundamental reason for the functional dependence in equation 
()12|) . Adding a point to the data set is equivalent to changing the lattice 
or changing e = e(iV). The jumps appearing in Figure El will occur when e 
exceeds |, which occurs approximately after adding ~ A points. By locating 
the minima and maxima in this way, one can obtain an expression for the 
decay of the form (fT2"j) . 



4 Conclusion 

In conclusion, we have shown how to compute the autocorrelation function 
of noisy periodic signals in the case of a single frequency mode. Our scheme 
improves upon a naive application of the DFT. The main point is that more 
is not better when it comes to the DFT. The dependence on the number of 
points for the estimated frequency is more important than the position of the 
peak for a specific N. Our solution stems from building envelopes of minima 
and maxima of piecewise linear functions. This can certainly be improved; 
however there is no question that another solution has to be found by looking 
at the results in Figure 121 
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Finally, our approach is applicable to signals with more than one fre- 
quency mode. This will be the subject of a future work. 
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